Sampling Interval Analysis

Author

Kyle Nessen

Published

March 21, 2025

Background

This report was generated at the request of Francis to answer the question of what an appropriate sampling interval is for the Vandenberg Space Force Base camera study. The goal is to empirically test whether subsampling the photo data is functionally equivalent to using the full set of images. My motivation for subsampling is that labeling these photos is labor-intensive, and I suspect that the high frequency of 10-minute intervals is introducing unneeded noise. If we can reduce the number of images we label, we can process them much sooner; switching from 10-minute to 30-minute intervals alone would cut the total number of images by a factor of three. However, we should be careful that we are not throwing away information necessary to answer the question at hand, which is what this report is intended to shed light on.

Data Preparation

I won’t go into full detail on how I prepared and cleaned the data; instead, here is a quick summary to provide context for the results that follow. That said, I have all the code and the full analysis script, and I’m happy to share the specifics with anyone who would like to see them.

Monarch Count Data Preparation

  • Loaded the SC1 and SC2 datasets, which use 5-minute intervals; reduced them to 10-minute intervals by keeping every other row.
    • These two deployments are the only ones in the full dataset with this interval, which is why I reduced everything to 10 minutes.
  • Parsed timestamps and sorted data by deployment and time.
  • Appended SC4 data and re-sorted the full 10 min dataset.
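The thinning and sorting steps above can be sketched as follows (the analysis itself was done in R; this is an illustrative Python sketch, and the column names and values are hypothetical stand-ins for the real schema):

```python
import pandas as pd

# Hypothetical column names and values; the real datasets use their own schema.
sc1 = pd.DataFrame({
    "deployment_id": "SC1",
    "timestamp": pd.date_range("2023-11-17 08:00", periods=6, freq="5min"),
    "monarch_count": [120, 118, 119, 121, 117, 116],
})

# Thin the 5-minute deployments to 10 minutes by keeping every other row.
sc1_10min = sc1.iloc[::2].copy()

# Sort by deployment and time (other deployments would be concatenated first).
sc1_10min = sc1_10min.sort_values(["deployment_id", "timestamp"]).reset_index(drop=True)
```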

Nighttime Data Removal

  • Defined specific night periods for each deployment during which labelers copied and pasted the last valid count value.
  • Removed observations during these periods to avoid duplicated or unreliable counts.
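A sketch of the night-period filtering (the window times and column names here are hypothetical; the real values come from the labeling notes):

```python
import pandas as pd

# Hypothetical night windows per deployment (start, end).
night_periods = {
    "SC1": [(pd.Timestamp("2023-11-17 17:30"), pd.Timestamp("2023-11-18 06:30"))],
}

df = pd.DataFrame({
    "deployment_id": ["SC1"] * 3,
    "timestamp": pd.to_datetime(["2023-11-17 17:00", "2023-11-17 18:00", "2023-11-18 07:00"]),
    "monarch_count": [110, 110, 95],
})

def is_night(row):
    # Flag observations falling inside any defined night window for the deployment.
    for start, end in night_periods.get(row["deployment_id"], []):
        if start <= row["timestamp"] <= end:
            return True
    return False

# Keep only daytime observations with reliable counts.
df_day = df[~df.apply(is_night, axis=1)].reset_index(drop=True)
```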

Subsampling

  • Created datasets at 10, 20, 30, 60, 90, and 120-minute intervals by retaining every nth row within each deployment group.
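The every-nth-row subsampling within deployment groups can be sketched like this (illustrative Python; the real pipeline is in R):

```python
import pandas as pd

# 10-minute base data for one deployment (hypothetical values).
base = pd.DataFrame({
    "deployment_id": ["SC1"] * 12,
    "timestamp": pd.date_range("2023-11-17 08:00", periods=12, freq="10min"),
})

def subsample(df, interval_min, base_min=10):
    # Keep every nth row within each deployment to widen the sampling interval.
    n = interval_min // base_min
    keep = df.groupby("deployment_id").cumcount() % n == 0
    return df[keep].reset_index(drop=True)

# One dataset per target interval.
datasets = {i: subsample(base, i) for i in (10, 20, 30, 60, 90, 120)}
```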

Monarch Change Calculation

  • Calculated changes in butterfly counts between time points.
  • Replaced zero changes with 0.1 and applied a signed log transformation to prepare for modeling.
    • This was motivated by poor model performance in initial fits; the log transformation helped.
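One plausible reading of the transformation above is sign(x) · log|x| after replacing zeros with 0.1; the exact functional form is my assumption, not confirmed from the analysis script:

```python
import numpy as np

def signed_log(changes, zero_fill=0.1):
    # Replace exact zeros so the log is defined, then apply sign(x) * log(|x|).
    # Note: under this form a zero change maps to log(0.1), a negative value.
    x = np.where(changes == 0, zero_fill, changes).astype(float)
    return np.sign(x) * np.log(np.abs(x))

counts = np.array([120, 118, 118, 125])
changes = np.diff(counts)        # changes between time points: [-2, 0, 7]
transformed = signed_log(changes)
```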

Wind Data Integration

  • Loaded wind data and aligned it with butterfly observation intervals based on deployment and timestamps.
  • Calculated average wind speed, average and max gusts, and number of wind observations per interval.
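The per-interval wind summaries can be sketched as a bucket-and-aggregate step (illustrative Python; the record frequency and column names are hypothetical):

```python
import pandas as pd

# Hypothetical 1-minute wind records; the real data aligns on deployment + timestamp.
wind = pd.DataFrame({
    "deployment_id": "SC1",
    "timestamp": pd.date_range("2023-11-17 08:00", periods=20, freq="1min"),
    "wind_speed_mph": range(20),
    "wind_gust_mph": [w + 2 for w in range(20)],
})

# Bucket wind records into 10-minute observation intervals and summarize.
wind["interval"] = wind["timestamp"].dt.floor("10min")
summary = wind.groupby(["deployment_id", "interval"]).agg(
    avg_wind_speed_mph=("wind_speed_mph", "mean"),
    avg_wind_gust_mph=("wind_gust_mph", "mean"),
    max_wind_gust_mph=("wind_gust_mph", "max"),
    n_wind_obs=("wind_speed_mph", "size"),
).reset_index()
```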

Modeling

Based on my conversation with Francis, we decided to try an information-theoretic approach: build the subsampled datasets, run the same model on each, and compare AIC scores. Below is some information on how I did that.

Model Preparation

  • Defined monarch_change as the response variable (log transformed).
  • Selected wind metrics (avg_wind_speed_mph, avg_wind_gust_mph, max_wind_gust_mph) as fixed effects.
  • Included deployment_id as a random effect to account for variation between deployments.
  • Scaled all wind predictors for comparability.
  • Converted timestamps to numeric values to model temporal autocorrelation.
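The scaling and time-conversion steps can be sketched as follows (illustrative Python; in the actual R workflow this corresponds to scale() and a numeric time variable):

```python
import pandas as pd

df = pd.DataFrame({
    "avg_wind_speed_mph": [3.0, 5.0, 7.0],
    "timestamp": pd.to_datetime(["2023-11-17 08:00", "2023-11-17 08:30", "2023-11-17 09:00"]),
})

# z-score the predictor (mean 0, sd 1), matching R's scale() with sample sd.
col = df["avg_wind_speed_mph"]
df["avg_wind_speed_scaled"] = (col - col.mean()) / col.std(ddof=1)

# Convert timestamps to numeric (seconds since epoch) for the AR(1) structure.
df["time_numeric"] = df["timestamp"].astype("int64") // 10**9
```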

Model Fitting

  • Used lme() from the nlme package to fit a linear mixed-effects model.
  • Included a first-order autoregressive structure (corAR1) to account for temporal correlation within deployments.
  • Applied the model to each subsampled dataset (10 to 120 minutes).
  • Captured model convergence status, AIC, fixed and random effects, and autocorrelation parameter.

AIC Comparison

Below are the AIC scores for each subsampled dataset, along with a chart. From these results, the longest interval produces the best (lowest) AIC. But I’m cautious about this interpretation, because I think we are misusing AIC as a comparison tool: we are effectively changing the dataset for each model. AIC is sensitive to the number of data points, which may explain why we see such a predictable decrease as the interval lengthens.

Model comparison across different sampling intervals
Interval (minutes)   AIC Score   Number of Observations   Model Converged
10                   1343.45     298                      TRUE
20                    682.85     149                      TRUE
30                    468.60      99                      TRUE
60                    234.91      50                      TRUE
90                    156.23      33                      TRUE
120                   116.14      25                      TRUE

AIC scores for all sub-sampled datasets.
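The concern about sample size can be illustrated directly: for an intercept-only Gaussian model fit to draws from one unchanged process, AIC shrinks roughly in proportion to the number of observations, simply because the log-likelihood is a sum over data points. A minimal sketch (synthetic data, not the monarch dataset):

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_aic(y):
    # AIC = 2k - 2*logLik for an intercept-only Gaussian model (k = 2: mean, variance).
    n = len(y)
    sigma2 = y.var()  # MLE of the variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return 2 * 2 - 2 * loglik

# Same data-generating process, progressively fewer observations.
y = rng.normal(size=300)
aics = {n: gaussian_aic(y[:n]) for n in (300, 150, 100, 50)}
# Halving the data roughly halves the AIC even though nothing about the process changed.
```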

30 Min Model Results

Out of curiosity, I wanted to see how some of these models were performing and whether we could glean any insight from them. My intuition keeps pointing me towards 30-minute intervals, and in my initial exploratory analysis the 30-minute model seemed to have the best performance, though by no means was it perfect.

Below is the model performance output, which shows there are some issues but that we’re getting close to a defensible model. Below that is the raw output from the model.

Model Performance Diagnostics
Linear mixed-effects model fit by REML
  Data: model_data 
       AIC      BIC    logLik
  468.6036 486.4807 -227.3018

Random effects:
 Formula: ~1 | deployment_id
         (Intercept) Residual
StdDev: 9.468284e-05 2.471424

Correlation Structure: ARMA(1,0)
 Formula: ~time_numeric | deployment_id 
 Parameter estimate(s):
Phi1 
   0 
Fixed effects:  monarch_change ~ avg_wind_speed_scaled + avg_wind_gust_scaled +      max_wind_gust_scaled 
                           Value Std.Error DF    t-value p-value
(Intercept)           -0.4428602 0.2483875 93 -1.7829406  0.0779
avg_wind_speed_scaled -0.5588815 2.0504707 93 -0.2725625  0.7858
avg_wind_gust_scaled   0.3342085 2.1447222 93  0.1558283  0.8765
max_wind_gust_scaled   0.4028745 0.4293107 93  0.9384216  0.3505
 Correlation: 
                      (Intr) avg_wnd_s_ avg_wnd_g_
avg_wind_speed_scaled  0.000                      
avg_wind_gust_scaled   0.000 -0.981               
max_wind_gust_scaled   0.000  0.198     -0.349    

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-1.9820287 -0.7706652 -0.1926982  0.9128569  2.2988928 

Number of Observations: 99
Number of Groups: 3 

My key takeaway here is that wind does not appear to predict changes in monarch abundance from moment to moment (in this case, every 30 minutes). The same was true for all other models, including the 120-minute one, where the p-values were unambiguously large. Still, this is only a fraction of the dataset, and the model performance still has many issues. These preliminary results do agree, however, with what I have been seeing in the video, where clusters don’t obviously respond to wind patterns. Exposure to direct sunlight is another story, which I can tackle another time.

RMSE Comparison

After searching around for an alternative approach, I came across the idea of comparing root mean square error (RMSE). This has the advantage of giving a direct estimate of how much information is lost at each subsampling level. The graph below shows the relative information loss compared to the baseline for each interval. Happily, the 30-minute interval loses less than 5% relative to the full dataset, and the RMSE increases rapidly after that. So from my view, 30 minutes seems like the sweet spot, and certainly defensible. Looking forward to your thoughts.

Please note that I used an “observation period” grouping for this analysis, which groups the data by day within each deployment. Each group is therefore a continuous stretch of observations, without the added complexity of removed nighttime photos.
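One plausible sketch of a relative-RMSE comparison (the interpolation back to the full grid and the scaling by the series range are my own illustrative choices, not necessarily the exact method used):

```python
import numpy as np

# Hypothetical full-resolution (10-minute) count series for one observation period.
t_full = np.arange(0, 300, 10)           # minutes since start
y_full = 100 + 5 * np.sin(t_full / 30)   # stand-in for monarch counts

def relative_rmse(interval_min):
    # Keep every nth point, linearly interpolate back to the full grid,
    # and measure RMSE against the full series, scaled by its range.
    step = interval_min // 10
    t_sub, y_sub = t_full[::step], y_full[::step]
    y_hat = np.interp(t_full, t_sub, y_sub)
    rmse = np.sqrt(np.mean((y_full - y_hat) ** 2))
    return rmse / (y_full.max() - y_full.min())

losses = {i: relative_rmse(i) for i in (10, 20, 30, 60, 90, 120)}
```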

Relative root mean square error (RMSE) for each subsampling interval compared to the full dataset.

Appendix

I’m including a number of time series plots here to give you a better sense of the raw data. Each observation period (e.g., SC1 on November 19th) has its own figure, with all subsampling intervals plotted. These are mostly for reference.

SC1 Deployment

SC1 November 17th, 2023

SC1 November 18th, 2023

SC1 November 19th, 2023

SC1 November 20th, 2023

SC2 Deployment

SC2 November 17th, 2023

SC2 November 18th, 2023

SC2 November 19th, 2023

SC4 Deployment

SC4 December 3rd, 2023

SC4 December 4th, 2023